Temporal Fusion







Query-based Temporal Fusion with Explicit Motion for 3D Object Detection

Neural Information Processing Systems

Existing methods conduct temporal fusion based on either dense BEV features or sparse 3D proposal features. However, the former does not focus attention on foreground objects, leading to higher computation cost and sub-optimal performance.



Coherent Online Road Topology Estimation and Reasoning with Standard-Definition Maps

Pham, Khanh Son, Witte, Christian, Behley, Jens, Betz, Johannes, Stachniss, Cyrill

arXiv.org Artificial Intelligence

Most autonomous cars rely on the availability of high-definition (HD) maps. Current research aims to address this constraint by directly predicting HD map elements from onboard sensors and reasoning about the relationships between the predicted map and traffic elements. Despite recent advancements, the coherent online construction of HD maps remains a challenging endeavor, as it necessitates modeling the high complexity of road topologies in a unified and consistent manner. To address this challenge, we propose a coherent approach to predict lane segments and their corresponding topology, as well as road boundaries, all by leveraging prior map information represented by commonly available standard-definition (SD) maps. We propose a network architecture that leverages hybrid lane segment encodings comprising prior information and denoising techniques to enhance training stability and performance. Furthermore, we leverage past frames for temporal consistency. Our experimental evaluation demonstrates that our approach outperforms previous methods by a significant margin, highlighting the benefits of our modeling scheme.


CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection

Neural Information Processing Systems

Accurate and robust 3D object detection is a critical component in autonomous vehicles and robotics. While recent radar-camera fusion methods have made significant progress by fusing information in the bird's-eye view (BEV) representation, they often struggle to effectively capture the motion of dynamic objects, leading to limited performance in real-world scenarios. In this paper, we introduce CRT-Fusion, a novel framework that integrates temporal information into radar-camera fusion to address this challenge. Our approach comprises three key modules: Multi-View Fusion (MVF), Motion Feature Estimator (MFE), and Motion Guided Temporal Fusion (MGTF). The MFE module conducts two simultaneous tasks: estimation of pixel-wise velocity information and BEV segmentation.
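As a rough illustration of the motion-guided idea described above (not CRT-Fusion's actual implementation), temporal fusion can be sketched as warping a previous BEV feature map toward the current frame using per-cell velocity estimates. All names, shapes, and the nearest-neighbor scatter below are assumptions for the sketch:

```python
import numpy as np

def motion_guided_warp(prev_bev, velocity, dt=0.5):
    """Warp a previous BEV feature map toward the current frame using
    per-cell velocity estimates (a toy stand-in for motion-guided fusion).

    prev_bev : (H, W, C) BEV features from the previous timestep
    velocity : (H, W, 2) per-cell (dy, dx) displacement in cells/second
    dt       : time gap between frames in seconds
    """
    H, W, _ = prev_bev.shape
    warped = np.zeros_like(prev_bev)
    for y in range(H):
        for x in range(W):
            # Move each cell's features along its estimated motion vector
            # (nearest-neighbor scatter; real systems use differentiable sampling).
            ny = int(round(y + velocity[y, x, 0] * dt))
            nx = int(round(x + velocity[y, x, 1] * dt))
            if 0 <= ny < H and 0 <= nx < W:
                warped[ny, nx] = prev_bev[y, x]
    return warped
```

With zero velocity the warp is the identity; a uniform velocity shifts the whole feature map, so features from moving objects line up with their current positions before fusion.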


CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction

Ye, Zhangchen, Jiang, Tao, Xu, Chenfeng, Li, Yiming, Zhao, Hang

arXiv.org Artificial Intelligence

Vision-based 3D occupancy prediction is significantly challenged by the inherent limitations of monocular vision in depth estimation. This paper introduces CVT-Occ, a novel approach that leverages temporal fusion through the geometric correspondence of voxels over time to improve the accuracy of 3D occupancy predictions. By sampling points along the line of sight of each voxel and integrating the features of these points from historical frames, we construct a cost volume feature map that refines current volume features for improved prediction outcomes. Our method takes advantage of parallax cues from historical observations and employs a data-driven approach to learn the cost volume. We validate the effectiveness of CVT-Occ through rigorous experiments on the Occ3D-Waymo dataset, where it outperforms state-of-the-art methods in 3D occupancy prediction with minimal additional computational cost. The code is released at https://github.com/Tsinghua-MARS-Lab/CVT-Occ.
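The sampling step described above can be sketched in 2D: for one voxel, take points along the ray from the camera to the voxel and gather features from a historical feature map. This is a toy sketch with assumed names and nearest-neighbor lookup, not CVT-Occ's actual cost-volume code:

```python
import numpy as np

def line_of_sight_samples(hist_feat, camera_xy, voxel_xy, num_samples=4):
    """For one voxel, sample points along the camera-to-voxel ray and
    gather features from a historical (BEV-like) feature map.

    hist_feat : (H, W, C) historical feature map
    camera_xy : (2,) camera position in grid coordinates
    voxel_xy  : (2,) voxel position in grid coordinates
    returns   : (num_samples, C) features forming one cost-volume column
    """
    H, W, C = hist_feat.shape
    samples = np.zeros((num_samples, C))
    for i in range(num_samples):
        t = (i + 1) / num_samples  # fractions of the ray, ending at the voxel
        p = camera_xy + t * (voxel_xy - camera_xy)
        y, x = int(round(p[0])), int(round(p[1]))
        if 0 <= y < H and 0 <= x < W:
            # Nearest-neighbor gather; a real system would interpolate.
            samples[i] = hist_feat[y, x]
    return samples
```

Stacking such columns over all voxels and several historical frames yields the cost volume; a learned head then uses the parallax-induced feature differences to refine the current occupancy features.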


StreamMOS: Streaming Moving Object Segmentation with Multi-View Perception and Dual-Span Memory

Li, Zhiheng, Cui, Yubo, Zhong, Jiexi, Fang, Zheng

arXiv.org Artificial Intelligence

Moving object segmentation based on LiDAR is a crucial and challenging task for autonomous driving and mobile robotics. Most approaches explore spatio-temporal information from LiDAR sequences to predict moving objects in the current frame. However, they often focus on transferring temporal cues in a single inference and regard every prediction as independent of others. This may lead to inconsistent segmentation results for the same object across different frames. To solve this issue, we propose a streaming network with a memory mechanism, called StreamMOS, to build the association of features and predictions among multiple inferences. Specifically, we utilize a short-term memory to convey historical features, which can be regarded as spatial priors of moving objects and are used to enhance current inference by temporal fusion. Meanwhile, we build a long-term memory to store previous predictions and exploit them to refine current forecasts at the voxel and instance levels through voting. Besides, we apply a multi-view encoder with cascaded projection and asymmetric convolution to extract motion features of objects in different representations. Extensive experiments validate that our algorithm achieves competitive performance on the SemanticKITTI and Sipailou Campus datasets.

In urban roads, there are often many dynamic objects with variable trajectories, such as vehicles and pedestrians, which create collision risks for autonomous vehicles. Meanwhile, these moving objects cause errors in simultaneous localization and mapping (SLAM) [1], as well as pose challenges for obstacle avoidance [2] and path planning [3].
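The dual-span memory with prediction voting described above can be sketched as follows. This is a hypothetical simplification (dictionary labels instead of voxel grids, simple majority voting), not StreamMOS's actual mechanism:

```python
from collections import Counter, deque

class DualSpanMemory:
    """Toy sketch of a dual-span memory: a short-term buffer of recent
    features (spatial priors) and a long-term store of past per-voxel
    predictions used to refine the current prediction by voting."""

    def __init__(self, short_span=2, long_span=5):
        self.features = deque(maxlen=short_span)     # short-term: feature priors
        self.predictions = deque(maxlen=long_span)   # long-term: label history

    def update(self, feat, pred):
        """Store the latest features and per-voxel labels {voxel: label}."""
        self.features.append(feat)
        self.predictions.append(dict(pred))

    def refine(self, current_pred):
        """Majority vote per voxel over stored predictions plus the current one."""
        refined = {}
        for voxel, label in current_pred.items():
            votes = Counter([label])
            for past in self.predictions:
                if voxel in past:
                    votes[past[voxel]] += 1
            refined[voxel] = votes.most_common(1)[0][0]
        return refined
```

If a voxel was labeled "moving" in most past frames, a single flickering "static" prediction is overruled, which is the kind of cross-frame consistency the memory is meant to provide.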